# Multimodal embedding
So400m Long
Apache-2.0
A vision-language model fine-tuned from SigLIP 2, with the maximum text length increased from 64 to 256 tokens.
Text-to-Image
Transformers English

fancyfeast
27
3
Taxabind Vit B 16
MIT
TaxaBind is a multimodal embedding model that places six modalities in a shared embedding space, aimed at ecological applications. It supports zero-shot classification of species images using taxonomic text categories.
Multimodal Fusion
MVRL
3,672
0
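Zero-shot classification in a shared embedding space, as mentioned in the TaxaBind description, usually means comparing an image embedding against the embeddings of candidate text labels and picking the closest one. The sketch below illustrates only that comparison step with NumPy. The function name `zero_shot_classify`, the toy 3-dimensional vectors, and the taxonomic label strings are all assumptions made for illustration; they are not taken from the TaxaBind code or its model card, and real embeddings would come from the model's own image and text encoders.

```python
import numpy as np


def zero_shot_classify(image_emb, text_embs, labels):
    """Pick the label whose text embedding has the highest cosine
    similarity to the image embedding (hypothetical helper)."""
    # L2-normalize so the dot product equals cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = text_embs @ image_emb  # one similarity score per label
    best = int(np.argmax(scores))
    return labels[best], scores


# Toy vectors standing in for real encoder outputs (assumed values).
labels = [
    "Animalia Chordata Aves",
    "Plantae Tracheophyta Magnoliopsida",
]
text_embs = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
image_emb = np.array([0.9, 0.1, 0.0])

label, scores = zero_shot_classify(image_emb, text_embs, labels)
print(label)
```

Because no classifier head is trained, the label set can be changed at inference time simply by embedding a different list of category strings.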
Featured Recommended AI Models